make memorynew intrinsic #55913
base: master
@gbaraldi so with LLVM assertions enabled I'm getting
which is on the line that does |
I'd print everyone involved here with the way I showed you yesterday |
This now works! For simple examples like |
As an example of what is possible, AllocOpt was able to go from
define i64 @julia_f_769() #0 !dbg !5 {
top:
%pgcstack = call ptr @julia.get_pgcstack()
%current_task1 = getelementptr inbounds i8, ptr %pgcstack, i64 -112, !dbg !14
%memoryref_mem = call dereferenceable(40) ptr addrspace(10) @julia.gc_alloc_obj(ptr nonnull %current_task1, i64 40, ptr addrspace(10) addrspacecast (ptr @"+Core.GenericMemory#771.jit" to ptr addrspace(10))), !dbg !14
%0 = addrspacecast ptr addrspace(10) %memoryref_mem to ptr addrspace(11), !dbg !14
%1 = getelementptr inbounds { i64, ptr }, ptr addrspace(11) %0, i64 0, i32 1, !dbg !14
%2 = call nonnull ptr @julia.pointer_from_objref(ptr addrspace(11) %0) #4, !dbg !14
%3 = getelementptr inbounds i8, ptr %2, i64 16, !dbg !14
store ptr %3, ptr addrspace(11) %1, align 8, !dbg !14
store i64 3, ptr addrspace(11) %0, align 8, !dbg !14
%memoryref_data4 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) %memoryref_mem, ptr %3), !dbg !15
store i64 2, ptr addrspace(13) %memoryref_data4, align 8, !dbg !15, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data11 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 8, !dbg !32
store i64 4, ptr addrspace(13) %memoryref_data11, align 8, !dbg !32, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data18 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 16, !dbg !34
store i64 5, ptr addrspace(13) %memoryref_data18, align 8, !dbg !34, !tbaa !20, !alias.scope !24, !noalias !27
ret i64 11, !dbg !36
}
to the following, removing the allocation, which would likely allow it to just return the 11:
define i64 @julia_f_769() #0 !dbg !5 {
top:
%memoryref_mem = alloca [40 x i8], align 16
%pgcstack = call ptr @julia.get_pgcstack()
%current_task1 = getelementptr inbounds i8, ptr %pgcstack, i64 -112, !dbg !14
call void @llvm.lifetime.start.p0(i64 40, ptr %memoryref_mem)
%0 = freeze [40 x i8] undef, !dbg !14
store [40 x i8] %0, ptr %memoryref_mem, align 1, !dbg !14
%1 = getelementptr inbounds { i64, ptr }, ptr %memoryref_mem, i64 0, i32 1, !dbg !14
%2 = getelementptr inbounds i8, ptr %memoryref_mem, i64 16, !dbg !14
store ptr %2, ptr %1, align 8, !dbg !14
store i64 3, ptr %memoryref_mem, align 8, !dbg !14
%memoryref_data4 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) null, ptr %2), !dbg !15
store i64 2, ptr addrspace(13) %memoryref_data4, align 8, !dbg !15, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data11 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 8, !dbg !32
store i64 4, ptr addrspace(13) %memoryref_data11, align 8, !dbg !32, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data18 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 16, !dbg !34
store i64 5, ptr addrspace(13) %memoryref_data18, align 8, !dbg !34, !tbaa !20, !alias.scope !24, !noalias !27
ret i64 11, !dbg !36
} |
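For readers following along, here is a guess at the kind of Julia function that could produce IR of this shape. It is a hypothetical reconstruction from the constants visible above (length 3, stores of 2, 4 and 5, result 11), not the function actually used in the comment, and it assumes Julia 1.11 or newer where `Memory` exists.

```julia
# Hypothetical reconstruction: the real `f` from the comment is not shown.
# The `Memory` never escapes `f`, so escape analysis can in principle
# replace the heap allocation with a stack slot, or remove it entirely.
function f()
    mem = Memory{Int}(undef, 3)   # 3 Int64 slots, uninitialized
    mem[1] = 2
    mem[2] = 4
    mem[3] = 5
    return mem[1] + mem[2] + mem[3]
end

println(f())
```

On a build with this change, `@code_llvm f()` may show IR along the lines of the second listing, though the exact output depends on the Julia and LLVM versions.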
Can you please add an llvm pass test for #56030 (comment) (removing all memory for a simple case where the Memory object doesn't escape)? |
Do you want an actual LLVM pass, or can I just write a test for 0 allocations? |
I think an llvm test would be more robust, but probably a simple zero-allocation test would do the job as well. |
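A simple zero-allocation check of the kind discussed here could look roughly like the sketch below. The function name `sum_small_memory` and the helper `measure_allocations` are made up for illustration and are not taken from the PR's test suite. Whether the reported byte count is actually zero depends on running a build where alloc-opt removes the `Memory`, so the sketch only reports the number rather than asserting on it.

```julia
using Test

# Hypothetical test subject: a small Memory that never escapes.
function sum_small_memory()
    mem = Memory{Int}(undef, 3)
    mem[1] = 2
    mem[2] = 4
    mem[3] = 5
    return mem[1] + mem[2] + mem[3]
end

# Measure inside a function so global-scope overhead is not counted.
measure_allocations() = @allocated sum_small_memory()

measure_allocations()           # first call triggers compilation
bytes = measure_allocations()   # second call measures steady state

@test sum_small_memory() == 11
# Assumption: on a build with this optimization one would expect 0 here;
# on other builds a non-zero value is normal.
println("bytes allocated: ", bytes)
```

An LLVM pass test, as suggested above, would check the IR directly and so would not depend on runtime allocation counters.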
LOL. This test is so good it broke a doctest in performance tips. We're testing to show that you get allocations if you have "bad" code that allocates arrays, but now it doesn't allocate :laughing: |
This is now on top of #55995 (to figure out why we weren't optimizing correctly), but other than that, I think this is good to go! |
Maybe a test of no allocations in simple cases as discussed above? 🙂 |
Spooky! 👻 🎃 |
@aviatesk any idea why the effects tests are failing? |
I guess this issue has been resolved by implementing the effects modeling for the new builtin? |
yeah, I hadn't realized how good the effects for this intrinsic are. |
actually CI thinks stuff is still very broken (which is weird given that the test passed locally) |
So currently this appears to only allow for stack allocation when the size of the |
But stack size is limited. Stack-allocating when the size isn't known at compile time seems dangerous? |
Could have runtime checks on the size. |
the dynamic version would be great, but would add an extra ~100 lines of code since you would have to make the basic blocks to handle both sides of the allocation |
Mhm. What I do with the
if fits_in_slab(sz, slab)
    p = alloca_like_path(sz, slab)
else
    p = @noinline malloc_like_path(sz)
end
with the cleanup phase at the end of the region then either resetting the slab, or |
So we can probably do something better if we can prove that the dynamically sized array doesn't escape. https://github.com/JuliaLang/julia/pull/52382/files started some work on this. We could, for example, have it be a malloc. Making it a stack allocation is difficult because VLAs are sketchy, so we would need to do something like allocate a small buffer and switch to malloc when the request doesn't fit. The extra branch makes me think it should just be a malloc. |
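To make the "small buffer, else malloc" idea concrete, here is a user-level sketch of a runtime size check with a fallback path. It is only an analogy for the branch being discussed, not what the compiler pass does or would do. `Slab`, `fits_in_slab` and `with_scratch` are hypothetical names; alignment is ignored for brevity.

```julia
# Hypothetical bump-allocated slab; all names here are illustrative.
mutable struct Slab
    buf::Vector{UInt8}
    offset::Int
end
Slab(n::Integer) = Slab(Vector{UInt8}(undef, n), 0)

fits_in_slab(sz, slab::Slab) = slab.offset + sz <= length(slab.buf)

# Run `f(ptr, sz)` with `sz` bytes of scratch space that must not escape `f`.
function with_scratch(f, slab::Slab, sz::Integer)
    if fits_in_slab(sz, slab)
        saved = slab.offset
        slab.offset += sz
        try
            # Fast path: bump the offset, keep the buffer alive while in use.
            return GC.@preserve slab f(pointer(slab.buf) + saved, sz)
        finally
            slab.offset = saved          # cleanup: reset the slab
        end
    else
        p = Libc.malloc(sz)
        p == C_NULL && throw(OutOfMemoryError())
        try
            # Slow path: request too large for the slab.
            return f(convert(Ptr{UInt8}, p), sz)
        finally
            Libc.free(p)                 # cleanup: free the fallback buffer
        end
    end
end

slab = Slab(64)
fill_and_sum(p, n) = begin
    for i in 1:n
        unsafe_store!(p, UInt8(i % 256), i)
    end
    sum(Int(unsafe_load(p, i)) for i in 1:n)
end

small = with_scratch(fill_and_sum, slab, 16)     # fits in the slab
large = with_scratch(fill_and_sum, slab, 1024)   # falls back to malloc
println((small, large))
```

The branch in `with_scratch` is the extra cost mentioned above: every allocation site pays for the size check, which is one argument for using a plain malloc for dynamically sized, non-escaping buffers.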
The doctest failure
is concerning, since a simple for loop is giving completely wrong results. The doctest failure about the allocation disappearing, by contrast, is very welcome. |
yeah. something in alloc_opt is lying about the lifetime of the memory leading it to be removed spuriously. Gabriel and I are looking into it. |
The Value printer LLVM uses just prints the kind of instruction so it just shows call.
Update llvm-alloc-opt.cpp
no segfault empty mem optimization optimized! fix zeroinit it works It's alive! use checked rather than overflow fix error path sink ptls debugs fixes fix optimization typo cleanup and test mark some tests broken mark some as broken move boundscheck to julia and improve codegen fix-bootsrap off by 1 update overflow message switch wideint_t to overflowing operations address review Co-authored-by: Jameson Nash <[email protected]> Update src/codegen.cpp Co-authored-by: Jameson Nash <[email protected]> Update src/codegen.cpp Co-authored-by: Jameson Nash <[email protected]> Update src/codegen.cpp Co-authored-by: Jameson Nash <[email protected]> Update src/builtins.c Co-authored-by: Jameson Nash <[email protected]> Update src/codegen.cpp Co-authored-by: Jameson Nash <[email protected]> adress review fix tbaa fix issues add test fix build on clang fix
Co-authored-by: Jeff Bezanson <[email protected]>
Co-authored-by: Jeff Bezanson <[email protected]>
…mpiles down to a noop most of the time
This speeds up making new `Memory`s and allows the compiler to better understand what's going on, allowing for LLVM-level escape analysis in some cases. There is more room to grow this: currently this only optimizes fairly small `Memory`s, since bigger ones would require writing some more LLVM code, and we probably want a size limit on putting `Memory` on the stack to avoid stack overflow. For larger ones, we could potentially inline the `free` so the `Memory` doesn't have to be swept by the GC, etc.

Benchmarks: